    Accurate and budget-efficient text, image, and video analysis systems powered by the crowd

    Full text link
    Crowdsourcing systems empower individuals and companies to outsource labor-intensive tasks that cannot currently be solved by automated methods and are expensive to tackle by domain experts. Crowdsourcing platforms are traditionally used to provide training labels for supervised machine learning algorithms. Crowdsourced tasks are distributed among internet workers who typically have a range of skills and knowledge, differing previous exposure to the task at hand, and biases that may influence their work. This inhomogeneity of the workforce makes the design of accurate and efficient crowdsourcing systems challenging. This dissertation presents solutions to improve existing crowdsourcing systems in terms of accuracy and efficiency. It explores crowdsourcing tasks in two application areas: political discourse and annotation of biomedical and everyday images. The first part of the dissertation investigates how workers' behavioral factors and their unfamiliarity with data can be leveraged by crowdsourcing systems to control quality. Through studies that involve familiar and unfamiliar image content, the thesis demonstrates the benefit of explicitly accounting for a worker's familiarity with the data when designing annotation systems powered by the crowd. The thesis next presents Crowd-O-Meter, a system that automatically predicts the vulnerability of crowd workers to believe "fake news" in text and video. The second part of the dissertation explores the reversed relationship between machine learning and crowdsourcing by incorporating machine learning techniques for quality control of crowdsourced end products. In particular, it investigates whether machine learning can be used to improve the quality of crowdsourced results while also respecting budget constraints. The thesis proposes an image analysis system called ICORD that utilizes behavioral cues of the crowd worker, augmented by automated evaluation of image features, to dynamically infer the quality of a worker-drawn outline of a cell in a microscope image. ICORD determines the need to seek additional annotations from other workers in a budget-efficient manner. Next, the thesis proposes a budget-efficient machine learning system that uses fewer workers to analyze easy-to-label data and more workers for data that require extra scrutiny. The system learns a mapping from data features to the number of allocated crowd workers for two case studies: sentiment analysis of Twitter messages and segmentation of biomedical images. Finally, the thesis uncovers the potential for the design of hybrid crowd-algorithm methods by describing an interactive system for cell tracking in time-lapse microscopy videos, based on a prediction model that determines when automated cell tracking algorithms fail and human interaction is needed to ensure accurate tracking.

    BUOCA: Budget-Optimized Crowd Worker Allocation

    Full text link
    Due to concerns about human error in crowdsourcing, it is standard practice to collect labels for the same data point from multiple internet workers. We here show that the resulting budget can be used more effectively with a flexible worker assignment strategy that asks fewer workers to analyze easy-to-label data and more workers to analyze data that requires extra scrutiny. Our main contribution is to show how the number of workers allocated to a task can be computed optimally based on task features alone, without using worker profiles. Our target tasks are delineating cells in microscopy images and analyzing the sentiment toward the 2016 U.S. presidential candidates in tweets. We first propose an algorithm that computes a budget-optimized crowd worker allocation (BUOCA). We next train a machine learning system (BUOCA-ML) that predicts an optimal number of crowd workers needed to maximize the accuracy of the labeling. We show that the computed allocation can yield large savings in the crowdsourcing budget (up to 49 percentage points) while maintaining labeling accuracy. Finally, we envisage a human-machine system for performing budget-optimized data analysis at a scale beyond the feasibility of crowdsourcing.
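
    The abstract does not spell out the optimization procedure, so the following is only a minimal, hypothetical sketch of a greedy budget-constrained allocator in the spirit of BUOCA. Here marginal_gain is an assumed, user-supplied estimate of how much labeling accuracy improves when one more worker is added to an item; the published algorithm may differ.

        # Greedy sketch of budget-constrained worker allocation (an assumption for
        # illustration, not the published BUOCA algorithm). The budget is measured
        # in total labels collected.
        import heapq

        def allocate_workers(marginal_gain, n_items, budget, min_workers=1, max_workers=5):
            # Give every item the minimum number of workers first.
            allocation = [min_workers] * n_items
            remaining = budget - min_workers * n_items
            # Max-heap (negated gains) holding each item's next marginal gain.
            heap = [(-marginal_gain(i, min_workers), i) for i in range(n_items)]
            heapq.heapify(heap)
            while remaining > 0 and heap:
                neg_gain, i = heapq.heappop(heap)
                if -neg_gain <= 0 or allocation[i] >= max_workers:
                    continue  # no further benefit expected for this item
                allocation[i] += 1
                remaining -= 1
                heapq.heappush(heap, (-marginal_gain(i, allocation[i]), i))
            return allocation

        # Example: 20 items, a budget of 60 labels, gains that shrink as workers are added.
        alloc = allocate_workers(lambda i, k: (1 + i % 3) / (10.0 * k), n_items=20, budget=60)

    Under these assumptions, easy items (small estimated gains) stay at the minimum allocation while harder items absorb the remaining budget, mirroring the flexible assignment strategy described above.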

    Salient object subitizing

    Full text link
    We study the problem of salient object subitizing, i.e., predicting the existence and the number of salient objects in an image using holistic cues. This task is inspired by the ability of people to quickly and accurately identify the number of items within the subitizing range (1–4). To this end, we present a salient object subitizing image dataset of about 14K everyday images, which are annotated using an online crowdsourcing marketplace. We show that using an end-to-end trained convolutional neural network (CNN) model, we achieve prediction accuracy comparable to human performance in identifying images with zero or one salient object. For images with multiple salient objects, our model also performs significantly better than chance without requiring any localization process. Moreover, we propose a method to improve the training of the CNN subitizing model by leveraging synthetic images. In experiments, we demonstrate the accuracy and generalizability of our CNN subitizing model and its applications in salient object detection and image retrieval. This research was supported in part by US NSF Grants 0910908 and 1029430, and gifts from Adobe and NVIDIA. https://arxiv.org/abs/1607.07525
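
    As a companion illustration, the sketch below builds a 5-way subitizing classifier (0, 1, 2, 3, or 4+ salient objects) on a standard torchvision backbone. The backbone choice, input size, and training setup (including the synthetic-image augmentation) are assumptions, not the architecture used in the paper.

        # Minimal subitizing classifier sketch (assumed backbone and head, not the
        # paper's exact CNN). Requires torch and torchvision.
        import torch
        import torch.nn as nn
        from torchvision import models

        def build_subitizing_model(num_classes: int = 5) -> nn.Module:
            backbone = models.resnet18()                     # assumed backbone choice
            backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
            return backbone

        model = build_subitizing_model()
        logits = model(torch.randn(1, 3, 224, 224))          # batch of one RGB image
        count_bucket = logits.argmax(dim=1)                  # predicted 0..4+ class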

    Explainable Deep Classification Models for Domain Generalization

    Full text link
    Conventionally, AI models are thought to trade off explainability for lower accuracy. We develop a training strategy that not only leads to a more explainable AI system for object classification but also, as a consequence, suffers no perceptible accuracy degradation. Explanations are defined as regions of visual evidence upon which a deep classification network makes a decision. This is represented in the form of a saliency map conveying how much each pixel contributed to the network's decision. Our training strategy enforces periodic saliency-based feedback to encourage the model to focus on the image regions that directly correspond to the ground-truth object. We quantify explainability using an automated metric and using human judgement. We propose explainability as a means for bridging the visual-semantic gap between different domains, where model explanations are used as a means of disentangling domain-specific information from otherwise relevant features. We demonstrate that this leads to improved generalization to new domains without hindering performance on the original domain.
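
    The saliency-feedback idea can be illustrated with a short sketch: compute a gradient-based saliency map for the true class and penalize saliency mass that falls outside the ground-truth object mask. The saliency definition, loss form, and weighting below are assumptions for illustration, not the paper's exact formulation.

        # Illustrative saliency-feedback loss (assumed formulation). masks is a
        # binary ground-truth object mask: 1 inside the object, 0 outside.
        import torch
        import torch.nn.functional as F

        def saliency_feedback_loss(model, images, labels, masks, lam=0.1):
            images = images.clone().requires_grad_(True)
            logits = model(images)
            ce = F.cross_entropy(logits, labels)
            # Gradient of the true-class scores with respect to the input pixels.
            class_scores = logits.gather(1, labels.unsqueeze(1)).sum()
            grads = torch.autograd.grad(class_scores, images, create_graph=True)[0]
            saliency = grads.abs().amax(dim=1)        # (N, H, W) saliency map
            outside = saliency * (1.0 - masks)        # evidence outside the object
            return ce + lam * outside.mean()

    Calling this loss periodically during training, as the abstract describes, would steer the model's visual evidence toward the annotated object regions.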

    Predicting Quality of Crowdsourced Image Segmentations from Crowd Behavior

    No full text
    Quality control (QC) is an integral part of many crowdsourcing systems. However, popular QC methods, such as aggregating multiple annotations, filtering workers, or verifying the quality of crowd work, introduce additional costs and delays. We propose a complementary paradigm to these QC methods based on predicting the quality of submitted crowd work. In particular, we propose to predict the quality of a given crowd drawing directly from a crowd worker's drawing time, number of user clicks, and average time per user click. We focus on the task of drawing the boundary of a single object in an image. To train and test our prediction models, we collected a total of 2,025 crowd-drawn segmentations for 405 familiar everyday images and unfamiliar biomedical images from 90 unique crowd workers. We first evaluated five prediction models learned using different combinations of the three worker behavior cues for all images. Experiments revealed that time per user click was the most effective cue for predicting segmentation quality. We next inspected the predictive power of models learned using crowd annotations collected for familiar and unfamiliar data independently. Prediction models were significantly more effective at estimating segmentation quality from crowd worker behavior for familiar image content than for unfamiliar image content.
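
    A minimal sketch of the prediction step, assuming the three behavioral cues above are available per submitted segmentation and that quality is scored against a reference outline; the regressor choice and the toy numbers are illustrative, not the five models evaluated in the paper.

        # Quality prediction from worker behavior (illustrative regressor, not the
        # paper's models). Features: drawing time, number of clicks, time per click.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        def behavior_features(drawing_time, num_clicks):
            drawing_time = np.asarray(drawing_time, dtype=float)
            num_clicks = np.asarray(num_clicks, dtype=float)
            time_per_click = drawing_time / np.maximum(num_clicks, 1.0)
            return np.column_stack([drawing_time, num_clicks, time_per_click])

        # Hypothetical training data: behavioral cues and a quality score per drawing
        # (e.g. overlap with a reference segmentation).
        X = behavior_features(drawing_time=[42.0, 130.5, 18.2], num_clicks=[15, 60, 4])
        y = np.array([0.81, 0.93, 0.40])
        model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
        predicted_quality = model.predict(X)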

    Investigating the Influence of Data Familiarity to Improve the Design of a Crowdsourcing Image Annotation System

    No full text
    Crowdsourced demarcations of object boundaries in images (segmentations) are important for many vision-based applications. A commonly reported challenge is that a large percentage of crowd results are discarded due to concerns about quality. We conducted three studies to examine (1) how the quality of crowdsourced segmentations differs for familiar everyday images versus unfamiliar biomedical images, (2) how making familiar images less recognizable (rotating images upside down) influences crowd work with respect to the quality of results, segmentation time, and segmentation detail, and (3) how crowd workers' judgments of the ambiguity of the segmentation task, collected by voting, differ for familiar everyday images and unfamiliar biomedical images. We analyzed a total of 2,525 segmentations collected from 121 crowd workers and 1,850 votes from 55 crowd workers. Our results illustrate the potential benefit of explicitly accounting for human familiarity with the data when designing computer interfaces for human interaction.

    Rigorously Collecting Commonsense Judgments for Complex Question-Answer Content

    No full text
    Community Question Answering (CQA) websites are a popular tool for internet users to fulfill diverse information needs. Posted questions can be multiple sentences long and span diverse domains. They go beyond factoid questions and can be conversational, opinion-seeking, and experiential questions that might have multiple, potentially conflicting, useful answers from different users. In this paper, we describe a large-scale formative study to collect commonsense properties of questions and answers from 18 diverse communities on stackexchange.com. We collected 50,000 human judgments on 500 question-answer pairs. Commonsense properties are features that humans can extract and characterize reliably by using their commonsense knowledge and native language skills; no special domain expertise is assumed. We report results and suggestions for designing human computation tasks for collecting commonsense semantic judgments.

    Crowd-O-Meter: Predicting if a Person Is Vulnerable to Believe Political Claims

    No full text
    Social media platforms have been criticized for promoting false information during the 2016 U.S. presidential election campaign. Our work is motivated by the idea that a platform could reduce the circulation of false information if it could estimate whether its users are vulnerable to believing political claims. We here explore whether such a vulnerability could be measured in a crowdsourcing setting. We propose Crowd-O-Meter, a framework that automatically predicts whether a crowd worker will be consistent in his/her beliefs about political claims, i.e., consistently believes the claims are true or consistently believes the claims are not true. Crowd-O-Meter is a user-centered approach which interprets a combination of cues characterizing the user's implicit and explicit opinion bias. Experiments on 580 quotes from PolitiFact's fact-checking corpus of 2016 U.S. presidential candidates show that Crowd-O-Meter is precise and accurate for two news modalities: text and video. Our analysis also reveals the most informative cues of a person's vulnerability.
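
    As an illustration of the prediction step, the sketch below trains a binary classifier over per-worker cue features. The feature names and toy values are hypothetical placeholders for the implicit and explicit opinion-bias cues described above, not the cues actually used by Crowd-O-Meter.

        # Binary consistency classifier sketch (hypothetical features, not the
        # published Crowd-O-Meter cues).
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        # One row per worker, e.g. [explicit bias rating, trust-in-media rating,
        # fraction of claims rated true, response-time variance] -- assumed features.
        X = np.array([
            [1.0, 0.2, 0.9, 3.1],
            [0.0, 0.8, 0.5, 1.2],
            [1.0, 0.1, 0.8, 2.7],
            [0.0, 0.9, 0.4, 0.9],
        ])
        y = np.array([1, 0, 1, 0])   # 1 = consistent beliefs about the claims, 0 = not
        clf = LogisticRegression().fit(X, y)
        consistency_prob = clf.predict_proba(X)[:, 1]   # per-worker prediction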